Applying Topic Modeling to Forensic Data
Abstract
Most actionable evidence is identified during the analysis phase of digital forensic investigations. Currently, the analysis phase relies on expression-based searches, which assume a good understanding of the evidence; latent evidence cannot be found using such methods. Knowledge discovery and data mining (KDD) techniques can significantly enhance the analysis process. A promising KDD technique is topic modeling, which infers the underlying semantic context of text and summarizes the text using topics described by words. This paper investigates the application of topic modeling to forensic data and its ability to contribute to the analysis phase. It also highlights the challenges that forensic data poses to topic modeling algorithms and reports on the lessons learned from a case study.
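As a rough illustration of what "topics described by words" can mean in practice, the following is a minimal sketch of one common topic modeling algorithm, latent Dirichlet allocation fitted by collapsed Gibbs sampling. The abstract does not say which algorithm the paper uses, so the choice of LDA is an assumption, as are the function names (`lda_gibbs`, `top_words`), the hyperparameter values, and the toy token lists, which are invented and are not data from the case study.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling; return per-topic word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                        # tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_words(topic_word, n=3):
    """Summarize each topic by its n most frequent words."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Invented toy "documents" (already tokenized); not real forensic data.
docs = [
    ["invoice", "payment", "transfer", "account"],
    ["payment", "account", "bank", "transfer"],
    ["meeting", "schedule", "travel", "flight"],
    ["flight", "travel", "hotel", "schedule"],
]
topic_word = lda_gibbs(docs)
topics = top_words(topic_word)
print(topics)
```

Unlike an expression-based search, nothing here requires the analyst to know the search terms in advance: the word lists are inferred from co-occurrence alone. On a corpus this small the resulting topics are not reliable, so the sketch only shows the mechanics.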
Similar resources
Applying Graph Theory to Modeling Investigations
This paper presents a methodology for applying the elements of graph theory to modeling forensic investigations. This methodology uses well established principles of graph theory to model any forensic investigation and thus mathematically evaluate the elements of a case, including the probabilities associated with specific suspects
Applying Topic Modelling on Forensic Data: a Case Study
Most actionable evidence for investigation purposes is identified during the analysis phase of a digital investigation process. The objective of the analysis phase (digital analysis) is to reduce the quantity and enhance the intelligibility of data that must be reviewed by a human analyst. Currently, this is done through expression-based searching, which assumes a good understanding of the evid...
An Adaptation of Topic Modeling to Sentences
Advances in topic modeling have yielded effective methods for characterizing the latent semantics of textual data. However, applying standard topic modeling approaches to sentence-level tasks introduces a number of challenges. In this paper, we adapt the approach of latent Dirichlet allocation to include an additional layer for incorporating information about the sentence boundaries in document...
Contextual Modeling for Meeting Translation Using Unsupervised Word Sense Disambiguation
In this paper we investigate the challenges of applying statistical machine translation to meeting conversations, with a particular view towards analyzing the importance of modeling contextual factors such as the larger discourse context and topic/domain information on translation performance. We describe the collection of a small corpus of parallel meeting data, the development of a statistica...
Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...